Analytical Visualization on the History of Himalayan Expeditions
INFO 526 - Summer 2024 - Final Project
Abstract
The Himalayan Database, a meticulous archive originating from the pioneering work of Elizabeth Hawley, provides an invaluable resource for understanding the history of mountaineering in the Nepal Himalaya. This project leverages a subset of this comprehensive database, specifically focusing on expedition data recorded between 2020 and 2024, to analyze key trends and influential factors in contemporary Himalayan climbing. Utilizing two tidy tibbles, ‘peaks’ and ‘expeditions,’ we investigate patterns related to seasonality, success rates, and national participation within this recent timeframe. Furthermore, the analysis delves into the relationships between critical expedition choices, such as selected routes and agency affiliations, and their impact on expedition outcomes, including success probabilities and fatality risks. By examining these variables across diverse nationalities and temporal periods within this focused dataset, this study aims to contribute a deeper, data-driven understanding of the multifaceted elements influencing mountaineering endeavors in the challenging Himalayan environment.
Introduction
Mountaineering in the Nepal Himalaya represents one of humanity’s most profound engagements with extreme natural environments, characterized by unparalleled challenges and breathtaking achievements. Understanding the dynamics and outcomes of these expeditions is crucial for both historical context and future endeavors. This project embarks on an exploratory analysis of a specific segment of the rich historical data encapsulated within The Himalayan Database, an enduring legacy of Elizabeth Hawley’s dedicated efforts to document every facet of Himalayan climbing history. Originally compiled from a vast array of sources and made freely available online since 2017, the full database serves as a cornerstone for mountaineering research.
Our study specifically focuses on intriguing patterns and insights derived from mountaineering expeditions undertaken in the Nepal Himalaya during the years 2020 to 2024. By analyzing this extensive, yet focused, dataset of Himalayan climbs (structured into ‘peaks’ and ‘expeditions’ tibbles), this project seeks to uncover meaningful relationships between the strategic choices climbers make, such as their selected routes and expedition agencies, and their chances of achieving success or facing the tragic risk of fatalities. The analysis particularly aims to shed light on how these critical factors vary across different nations and evolving time periods within this contemporary five-year window. Through this focused exploration of a recent subset of The Himalayan Database, we aspire to offer a deeper, data-driven understanding of what influences expedition outcomes in one of the world’s most challenging and captivating mountaineering environments.
Question 1
Are Certain Routes Favored by Expeditions from Particular Nations, and Do They Have Disparate Success Rates?
Introduction
This section begins our investigation into the strategic choices made by mountaineering expeditions in the Nepal Himalaya and how those choices relate to expedition outcomes. Specifically, we ask whether the choice of climbing route depends on the nationality of the expedition team, and whether these nationally favored routes show markedly different success rates, which could reflect differences in national climbing philosophies, accumulated experience on specific routes, or inherent differences in route difficulty. A crucial aspect of our analysis is truncating low-volume attempts: a success rate computed from only a few expeditions on a route is dominated by chance (a single successful attempt reads as 100%), so excluding such combinations keeps statistical outliers from driving the comparison and grounds our insights in more statistically robust patterns.
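The low-volume filter can be illustrated with a small base-R sketch. It assumes a hypothetical expedition-level data frame with one row per expedition and columns `route`, `nation`, and a logical `success`; the threshold `min_attempts = 5` is an illustrative value, not necessarily the cutoff used in this project, and the toy rows are made up.

```r
# Success rate per route x nation, keeping only combinations with enough attempts.
# Assumed columns: route, nation, success (logical). The threshold is illustrative.
route_nation_success <- function(exped, min_attempts = 5) {
  attempts  <- aggregate(success ~ route + nation, data = exped, FUN = length)
  successes <- aggregate(success ~ route + nation, data = exped, FUN = sum)
  names(attempts)[3]  <- "attempts"
  names(successes)[3] <- "successes"
  out <- merge(attempts, successes, by = c("route", "nation"))
  out$success_rate <- 100 * out$successes / out$attempts
  # Without the cutoff, a single attempt would show up as a 100% (or 0%) rate
  out[out$attempts >= min_attempts, ]
}

# Made-up rows: six USA attempts on one route, one UK attempt on another
toy_routes <- data.frame(
  route   = c(rep("S Col-SE Ridge", 6), "N Face"),
  nation  = c(rep("USA", 6), "UK"),
  success = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE, TRUE)
)
route_nation_success(toy_routes, min_attempts = 5)
```

With the cutoff in place, the lone UK attempt (a "100%" success rate from one expedition) is dropped, and only the USA row (5 successes in 6 attempts, about 83.3%) remains.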
Approach
Our analytical approach to address this question is structured into three main phases:
1. Peak-Specific Success Rate Calibration: To establish a baseline, we first computed the overall success rates for the four most frequently attempted peaks in our 2020-2024 expeditions dataset. This step gives a broad overview of expedition success on these prominent summits and sets the comparative context for the more granular route-level analysis.
2. Visualization of Route Success by Nation: Next, we visualized the success percentages of the popular routes on each of these four peaks, broken down by participating nation, using bubble charts. In each chart, a bubble's size (and color) encodes the number of attempts by a given nation on a given route, its horizontal position marks the route, and its vertical position shows the corresponding success percentage. This gives a clear, intuitive view of both the popularity and the efficacy of routes across national teams.
3. Integrated Visualization and Interpretation: To facilitate a comprehensive comparative analysis and derive overarching interpretations, the individual bubble charts for all four peaks were combined into a single, integrated visualization. This unified view enables a direct comparison of route preferences and success rate disparities across multiple popular peaks and diverse national expedition teams, offering deeper insights into the interplay of national origin, route choice, and expedition success in the challenging Himalayan environment.
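Step 1 amounts to a grouped count and proportion. The base-R sketch below builds a table shaped like the `summary_data` object that the bar chart in the Analysis section plots (`pkname`, `attempts`, `success_rate`, `success_rate_label`). The input columns `pkname` and logical `success`, and the toy rows, are assumptions for illustration; how the project actually defines a successful expedition is not shown here.

```r
# Peak-level attempts and success rate for the most-attempted peaks.
# Assumed input columns: pkname (peak name) and success (logical).
peak_summary <- function(exped, top_n = 4) {
  attempts  <- table(exped$pkname)
  successes <- tapply(exped$success, exped$pkname, sum)
  out <- data.frame(
    pkname       = names(attempts),
    attempts     = as.integer(attempts),
    success_rate = 100 * as.numeric(successes[names(attempts)]) / as.integer(attempts)
  )
  out <- head(out[order(-out$attempts), ], top_n)  # most-attempted peaks first
  out$success_rate_label <- sprintf("%.1f%%", out$success_rate)
  out
}

# Made-up rows for illustration only
toy_peaks <- data.frame(
  pkname  = c("Everest", "Everest", "Everest", "Manaslu", "Manaslu", "Lhotse"),
  success = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE)
)
peak_summary(toy_peaks, top_n = 2)
```

Ranking by attempts before computing labels mirrors the "top 4 most frequently attempted peaks" framing: a peak with a perfect record from a single expedition (Lhotse in the toy data) does not displace busier peaks.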
Analysis
library(ggplot2)    # plotting
library(ggrepel)    # geom_text_repel()
library(scales)     # label_wrap()
library(patchwork)  # combining p1-p4 into one figure

general_census <- ggplot(summary_data, aes(x = reorder(pkname, -attempts), y = success_rate, fill = attempts)) +
  geom_col() +
  geom_text(aes(label = success_rate_label),
            vjust = -0.5,  # Position above bars
            color = "black") +
  scale_fill_viridis_c(
    option = "viridis",
    name = "Number of Attempts"
  ) +
  coord_cartesian(ylim = c(0, 100)) +  # Success rate is in percent (0-100)
  labs(
    x = "Peaks",
    y = "Success Rate (%)",
    title = "Success Rate of Top 4 Peaks Attempted by all Nations",
    subtitle = "Bar height indicates success rate; color indicates attempts",
    caption = "Source: https://github.com/rfordatascience/tidytuesday"
  ) +
  theme_minimal(base_size = 14)

p1 <- ggplot(ever_top3, aes(x = route, y = success_rate_on_route)) +
  geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
             alpha = 0.7,
             position = "identity") +
  geom_text_repel(data = subset(ever_top3, attempts_on_route_per_nation >= 1),
                  aes(label = nation, color = attempts_on_route_per_nation),
                  size = 5,
                  box.padding = 0.7,
                  point.padding = 0.6,
                  min.segment.length = Inf) +
  scale_color_viridis_c(option = "turbo",
                        name = "Attempts on Route",
                        breaks = c(1, 10, 20, 30, 40),
                        limits = c(0, 40),
                        guide = "none") +
  scale_size_continuous(range = c(3, 15),
                        name = "Attempts on Route",
                        breaks = c(10, 20, 30),
                        limits = c(0, 40),
                        guide = "none") +
  annotate("text", y = 125, x = 0.7, label = "Everest", size = 5, fontface = "bold") +
  labs(x = NULL,
       y = "Success Rate (%)") +
  scale_y_continuous(breaks = seq(0, 100, by = 25)) +
  scale_x_discrete(labels = label_wrap(10)) +
  coord_cartesian(ylim = c(-5, 125)) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(hjust = 0.5, size = 9),
    legend.position = "none"
  )

p2 <- ggplot(amad_top3, aes(x = route, y = success_rate_on_route)) +
  geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
             alpha = 0.7,
             # "identity" as in the other panels: position_jitter(width = 0) still jitters
             # vertically by default, which would shift success rates away from their labels
             position = "identity") +
  geom_text_repel(data = subset(amad_top3, attempts_on_route_per_nation >= 1),
                  aes(label = nation, color = attempts_on_route_per_nation),
                  size = 5,
                  box.padding = 0.6,
                  point.padding = 0.8,
                  min.segment.length = Inf) +
  scale_color_viridis_c(option = "turbo",
                        name = "Attempts on Route",
                        breaks = c(1, 10, 20, 30, 40),
                        limits = c(0, 40),
                        guide = "none") +
  scale_size_continuous(range = c(3, 15),
                        name = "Attempts on Route",
                        breaks = c(10, 20, 30),
                        limits = c(0, 40),
                        guide = "none") +
  annotate("text", y = 120, x = 0.9, label = "Ama Dablam", size = 5, fontface = "bold") +
  labs(x = NULL,
       y = NULL) +
  scale_y_continuous(breaks = seq(0, 100, by = 25)) +  # Adjusted seq start to 0 for clarity
  scale_x_discrete(labels = label_wrap(10)) +
  coord_cartesian(ylim = c(-5, 120)) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(hjust = 0.5, size = 9),
    legend.position = "none"
  )

p3 <- ggplot(lhot_top3, aes(x = route, y = success_rate_on_route)) +
  geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
             alpha = 0.6,
             position = "identity") +
  geom_text_repel(data = subset(lhot_top3, attempts_on_route_per_nation >= 1),
                  aes(label = nation, color = attempts_on_route_per_nation),
                  size = 5,
                  box.padding = 0.6,
                  point.padding = 0.8,
                  min.segment.length = Inf,
                  position = "identity") +
  scale_color_viridis_c(option = "turbo",
                        name = "Attempts on Route",
                        breaks = c(1, 10, 20, 30, 40),
                        limits = c(0, 40),
                        guide = "none") +
  scale_size_continuous(range = c(3, 15),
                        name = "Attempts on Route",
                        breaks = c(10, 20, 30),
                        limits = c(0, 40),
                        guide = "none") +
  annotate("text", y = 130, x = 0.6, label = "Lhotse", size = 5, fontface = "bold") +
  labs(x = "Route",
       y = "Success Rate (%)") +
  scale_y_continuous(breaks = seq(0, 100, by = 25)) +
  scale_x_discrete(labels = label_wrap(10)) +
  coord_cartesian(ylim = c(-5, 130)) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(hjust = 0.5, size = 9),
    legend.position = "none"
  )

p4 <- ggplot(mana_top3, aes(x = route, y = success_rate_on_route)) +
  geom_point(aes(color = attempts_on_route_per_nation, size = attempts_on_route_per_nation),
             alpha = 0.7,
             position = "identity") +
  geom_text_repel(data = subset(mana_top3, attempts_on_route_per_nation >= 1),
                  aes(label = nation, color = attempts_on_route_per_nation),
                  size = 5,
                  box.padding = 0.6,
                  point.padding = 0.8,
                  min.segment.length = Inf,
                  position = "identity") +
  # Only p4 defines visible guides; they become the shared legend of the combined plot
  scale_color_viridis_c(option = "turbo",
                        name = "\n",
                        breaks = c(1, 10, 20, 30, 40),
                        limits = c(0, 40),
                        guide = guide_colorbar(direction = "horizontal", title.position = "top")) +
  scale_size_continuous(range = c(3, 15),
                        name = "Attempts on Route (size + color)",
                        breaks = c(1, 10, 20, 30, 40),
                        limits = c(0, 40),
                        guide = guide_legend(title.position = "top")) +
  annotate("text", y = 120, x = 0.55, label = "Manaslu", size = 5, fontface = "bold") +
  labs(x = "Route",
       y = NULL) +
  scale_y_continuous(breaks = seq(0, 100, by = 25)) +
  scale_x_discrete(labels = label_wrap(10)) +
  coord_cartesian(ylim = c(-5, 120)) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(hjust = 0.5, size = 9),
    legend.position = "none"
  )

combined_plot <- (p1 + p2) / (p3 + p4) +
  plot_layout(guides = "collect") +
  plot_annotation(
    title = "National Route Preferences and Success Rates\nin High-Altitude Peak Expeditions (2020-2024)",
    subtitle = "Bubble Size and Color Show Attempts, with Success Rates\nfor Top 3 Nations Across Four Most Popular Peaks",
    caption = "Source: https://github.com/rfordatascience/tidytuesday",
    theme = theme(
      plot.title = element_text(face = "bold", size = 18, hjust = 0.2),
      plot.subtitle = element_text(size = 14, hjust = 0.2),
      plot.caption = element_text(size = 14)
    )
  ) &
  # `&` applies this theme to every panel, overriding the per-panel legend.position = "none"
  theme(
    legend.position = "bottom",
    legend.box = "horizontal",
    legend.title = element_text(size = 14, hjust = 0.5)
  )

Visualization
Alt text: Bar chart titled “Success Rate of Top 4 Peaks Attempted by all Nations (2020-2024)” showing success rates for Everest (88.9%), Ama Dablam (93.2%), Lhotse (88.9%), and Manaslu (52.6%). Bar height indicates success rate, with colors ranging from yellow (180 attempts) to dark purple (100 attempts) representing the number of attempts. Source: https://github.com/rfordatascience/tidytuesday.
Alt text: Bubble chart titled “National Route Preferences and Success Rates in High-Altitude Peak Expeditions (2020-2024)” showing success rates for top 3 nations across four popular peaks: Everest, Ama Dablam, Lhotse, and Manaslu. Bubbles represent attempts, with size and color indicating the number of attempts (1 to 40), and position showing success rates (0-100%). Routes include N Col-NE Ridge, N Face (Hornbein Couloir), S Col-SE Ridge for Everest; SW Ridge, W Face for Ama Dablam; S Col-W Face, W Face for Lhotse; and NE Face for Manaslu. Nations include USA, China, Nepal, India, and UK. Source: https://github.com/rfordatascience/tidytuesday.
Observation
The combined bubble plots for Everest, Lhotse, Ama Dablam, and Manaslu reveal clear patterns in national route preferences and their corresponding success rates:
Dominant Route Concentration: A significant majority of expedition attempts across these popular peaks are concentrated on a single, well-established route. For instance, the “S Col-SE Ridge” on Everest and the “W Face” on Lhotse are overwhelmingly favored, indicated by large bubbles representing high attempt volumes from nations like the USA, China, and Nepal. This suggests the existence of a ‘standard’ or ‘commercial’ route for each peak.
Marginality of Alternate Routes: While other routes were attempted, their popularity and often their success rates were considerably lower. Smaller bubbles and, at times, lower success percentages for alternative paths (e.g., Everest’s “N Col-NE Ridge”) highlight a strong collective preference for the primary, more established route, with alternatives attracting fewer expeditions.
Strategic Path-Peak Selection: The data strongly indicates that route selection is a critical determinant of expedition success, particularly for top-performing nations. Countries like the USA, China, Nepal, and India consistently favor specific path-peak combinations that demonstrate high success rates. This strategic alignment underscores a pragmatic approach where route choice is a calculated decision to optimize success probabilities.
Confirmation of Inherent Route Bias: The observed patterns in the 2020-2024 dataset reinforce a historical trend where specific routes have emerged as the most reliable. The high success rates for concentrated attempts on routes like Everest’s “S Col-SE Ridge” reflect a continuing bias towards ‘proven’ paths, likely due to well-documented passages, established infrastructure, and accumulated experience, which collectively contribute to a higher probability of success.
Question 2
Do Certain Agencies Have More Member/Personnel Deaths Than Others with Respect to Season and Date?
Introduction
This question examines the relationship between trekking agencies and fatality rates, and whether any notable trends emerge with respect to season and date. The findings matter because such patterns would raise further questions about why they occur: for example, whether specific safety policies or particular weather patterns lead some agencies to have higher fatality rates than others.
Approach
For this question, we first created variables for the total deaths per expedition and for the percentage of deaths per expedition among members (customers), among hired staff, and overall. We then began with a general graph of how many deadly expeditions each agency had in total between 2021 and 2024. Next, we zoomed in on each year and looked at both the percentage and the total deaths by agency (and by season). Finally, we graphed the actual number of deaths (raw totals) for each agency, in case they led to different conclusions than the percentage graphs.
Initially, we sought to combine all the percentage and raw-count graphs into one large graph faceted by year, but quickly realized this could not be done cleanly: the set of agencies on the agency axis, and their ordering by fatality, changes from year to year. Thus, we chose to present them as side-by-side plots on two tabs that are easy to switch between.
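The per-expedition variables described above can be sketched in base R. One plausible definition is shown: member, hired-staff, and total death percentages computed as deaths divided by the corresponding headcount. The column names (`agency`, `season`, `mdeaths`, `hdeaths`, `totmembers`, `tothired`) are assumptions modeled on the Himalayan Database's member and hired-staff fields, and the toy rows are made up. The sketch also derives summaries analogous to those the plots rely on: the number of deadly expeditions per agency, and the average total death percentage per agency and season (named to mirror `avg_ptotdeaths`).

```r
# Hypothetical columns: agency, season, mdeaths, hdeaths, totmembers, tothired
safe_ratio <- function(num, den) ifelse(den > 0, num / den, NA_real_)  # avoid 0/0

add_death_rates <- function(exped) {
  exped$total_deaths <- exped$mdeaths + exped$hdeaths
  exped$pmdeaths     <- safe_ratio(exped$mdeaths, exped$totmembers)    # members
  exped$phdeaths     <- safe_ratio(exped$hdeaths, exped$tothired)      # hired staff
  exped$ptotdeaths   <- safe_ratio(exped$total_deaths,
                                   exped$totmembers + exped$tothired)  # everyone
  exped
}

# Made-up expeditions for illustration only
toy_exped <- data.frame(
  agency     = c("Agency A", "Agency A", "Agency B", "Agency C"),
  season     = c("Spring", "Spring", "Autumn", "Spring"),
  mdeaths    = c(1, 0, 3, 0),
  hdeaths    = c(0, 1, 0, 0),
  totmembers = c(20, 15, 4, 6),
  tothired   = c(10, 5, 0, 3)
)

rates  <- add_death_rates(toy_exped)
deadly <- rates[rates$total_deaths > 0, ]  # expeditions with at least one death

# Number of deadly expeditions per agency
fatal_by_agency <- table(deadly$agency)

# Average total death percentage per agency and season
avg_by_agency_season <- aggregate(ptotdeaths ~ agency + season, data = deadly, FUN = mean)
names(avg_by_agency_season)[3] <- "avg_ptotdeaths"
```

Agency B's expedition has no hired staff, so its hired-staff percentage is left as `NA` rather than reported as 0%; averaging only over deadly expeditions, as here, matches the "at least one fatality" framing of the plots.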
Analysis
library(forcats)  # fct_infreq(), fct_rev(), fct_reorder()

agency_by_fatal <- exped_tidy_deadly |>
  # Graphs agency by number of fatal expeditions. fct_infreq was debugged by consulting AI after reading the documentation; same with after_stat(count)
  ggplot(aes(x = fct_rev(fct_infreq(agency)), fill = after_stat(count))) +
  geom_bar() +
  # Flips coordinates for better readability of agency names
  coord_flip() +
  # Increased number of breaks
  scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10)) +
  # Red is more deadly; yellow was chosen as the low end so intensity increases toward red
  scale_fill_gradient(low = "#ffce00", high = "darkred") +
  labs(
    title = "Number of expeditions through the Himalayas \nthat resulted in death by Agency",
    subtitle = "from 2021 - 2024",
    caption = "Source: TidyTuesday",
    x = NULL,
    y = "Number of expeditions that resulted in at least one death",
    fill = NULL
  ) +
  theme_minimal() +
  # Removed grid lines to improve readability
  theme(
    legend.position = "none",
    panel.grid.minor = element_blank(),
    panel.grid.major.y = element_blank()
  )

percent_2021 <- deaths_2021_av |>
  # Agencies ordered by average percent total deaths; fill shows season
  ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
  geom_col() +
  # Fixed axis limits for easier comparison between groups
  coord_flip(ylim = c(0, 1)) +
  # Colored by the color commonly associated with each season
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Percent Death (Average)",
    title = "Percent total deaths by Agency in 2021",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  # Show the death-rate axis as percentages
  scale_y_continuous(
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("0%", "20%", "40%", "60%", "80%", "100%")
  ) +
  # Label each expedition's individual percent death (rather than the average), distinguishing member (M) from hired staff (H)
  annotate("text", y = 0.2, x = 1, label = "M 6.6%") +
  annotate("text", y = 0.2, x = 2, label = "H 3.3%") +
  annotate("text", y = 0.3, x = 3, label = "Trek 1: M 54%") +
  annotate("text", y = 0.7, x = 3, label = "Trek 2: H 10%") +
  annotate("text", y = 0.3, x = 4, label = "M 20%") +
  annotate("text", y = 0.4, x = 5, label = "M 75%") +
  theme_minimal() +
  # Cleaned up grid for better readability of annotations
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

tota_2021 <- deaths_2021_raw |>
  # Same as above, but with total deaths instead of percentages
  ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
  geom_col() +
  # Fixed axis limits for easier comparison between groups
  coord_flip(ylim = c(0, 5)) +
  # Same colors as above for easy comparison
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Deaths",
    title = "Total deaths by Agency in 2021",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  # Set breaks
  scale_y_continuous(
    breaks = c(0, 1, 2, 3, 4, 5)
  ) +
  # Annotate with the number of treks instead: most totals are 1 or 2 deaths, so the per-trek labels would be superfluous and cluttered
  annotate("text", y = 4, x = 5, label = "Total Treks: 2") +
  theme_minimal() +
  # Clean grid for better readability
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

percent_2022 <- deaths_2022_av |>
  ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 1)) +
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Percent Death (Average)",
    title = "Percent total deaths by Agency in 2022",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("0%", "20%", "40%", "60%", "80%", "100%")
  ) +
  annotate("text", y = 0.2, x = 1, label = "H 20%") +
  annotate("text", y = 0.2, x = 2, label = "H 4.7%") +
  annotate("text", y = 0.2, x = 3, label = "M 7.6%") +
  annotate("text", y = 0.3, x = 4, label = "Trek 1: M 6.25%") +
  annotate("text", y = 0.7, x = 4, label = "Trek 2: H 10%") +
  annotate("text", y = 0.32, x = 4.8, label = "Trek 1: M 14.2%") +
  annotate("text", y = 0.32, x = 5.2, label = "Trek 2: H 6.25%") +
  annotate("text", y = 0.8, x = 5, label = "Trek 3: H 6.67%") +
  annotate("text", y = 0.25, x = 6, label = "M 20%") +
  annotate("text", y = 0.3, x = 7, label = "M 14.28%") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

total_2022 <- deaths_2022_raw |>
  ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 5)) +
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Deaths",
    title = "Total deaths by Agency in 2022",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 1, 2, 3, 4, 5)
  ) +
  annotate("text", y = 4, x = 6, label = "Total Treks: 3") +
  annotate("text", y = 3, x = 7, label = "Total Treks: 2") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

percent_2023 <- deaths_2023_av |>
  ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 1)) +
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Percent Death (Average)",
    title = "Percent total deaths by Agency in 2023",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("0%", "20%", "40%", "60%", "80%", "100%")
  ) +
  annotate("text", y = 0.2, x = 1, label = "M 5.8%") +
  annotate("text", y = 0.2, x = 2, label = "Trek 1: M 4.7%") +
  annotate("text", y = 0.5, x = 2, label = "Trek 2: M 6.7%") +
  annotate("text", y = 0.8, x = 2, label = "Trek 3: 6.3%") +
  annotate("text", y = 0.2, x = 3, label = "M 10%") +
  annotate("text", y = 0.2, x = 4, label = "M 1.66%") +
  annotate("text", y = 0.2, x = 5, label = "M 15.3%") +
  annotate("text", y = 0.2, x = 6, label = "H 10%") +
  annotate("text", y = 0.2, x = 7, label = "M 13.3%") +
  annotate("text", y = 0.2, x = 8, label = "M 20%") +
  annotate("text", y = 0.23, x = 9, label = "H 25%") +
  annotate("text", y = 0.25, x = 10, label = "M 16.7%") +
  annotate("text", y = 0.5, x = 11, label = "Trek 1: M 5.3%") +
  annotate("text", y = 0.8, x = 11, label = "Trek 2: M 33.3%") +
  annotate("text", y = 0.5, x = 12, label = "M 100%") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

total_2023 <- deaths_2023_raw |>
  ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 5)) +
  scale_fill_manual(values = c("orange", "lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Deaths",
    title = "Total deaths by Agency in 2023",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 1, 2, 3, 4, 5)
  ) +
  annotate("text", y = 3, x = 6, label = "Total Treks: 2") +
  annotate("text", y = 3, x = 12, label = "Total Treks: 3") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

percent_2024 <- deaths_2024_av |>
  ggplot(aes(x = fct_reorder(agency, avg_ptotdeaths, .desc = FALSE), y = avg_ptotdeaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 1)) +
  scale_fill_manual(values = "lightgreen") +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Percent Death (Average)",
    title = "Percent total deaths by Agency in 2024",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 0.2, 0.4, 0.6, 0.8, 1),
    labels = c("0%", "20%", "40%", "60%", "80%", "100%")
  ) +
  annotate("text", y = 0.2, x = 2, label = "M 6.7%") +
  annotate("text", y = 0.3, x = 1, label = "Trek 1: M 4.5% H 3.3%") +
  annotate("text", y = 0.7, x = 1, label = "Trek 2: H 3.2%") +
  annotate("text", y = 0.2, x = 3, label = "M 10%") +
  annotate("text", y = 0.2, x = 4, label = "M 11.1% H 2%") +
  annotate("text", y = 0.5, x = 5, label = "M 50%") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

total_2024 <- deaths_2024_raw |>
  ggplot(aes(x = fct_reorder(agency, total_deaths, .desc = FALSE), y = total_deaths, fill = season_factor)) +
  geom_col() +
  coord_flip(ylim = c(0, 5)) +
  scale_fill_manual(values = c("lightgreen")) +
  labs(
    fill = "Season",
    x = "Trekking Agency",
    y = "Total Deaths",
    title = "Total deaths by Agency in 2024",
    caption = "Source: TidyTuesday",
    subtitle = "M is trekking member death, H is hired staff death"
  ) +
  scale_y_continuous(
    breaks = c(0, 1, 2, 3, 4, 5)
  ) +
  annotate("text", y = 4, x = 4, label = "Total Treks: 2") +
  theme_minimal() +
  theme(
    axis.ticks.x = element_blank(),
    panel.grid = element_blank()
  )

Visualization
Alt text: The horizontal bar graph shows the number of expeditions through the Himalayas that resulted in death by agency from 2021-2024. Seven Summit Treks had by far the most deadly expeditions, with approximately 10 expeditions resulting in at least one death (shown in dark red). All other agencies had far fewer deadly expeditions, most having 1-2 each (shown in yellow/orange). The agencies with deadly expeditions include 7 Summits Adventure, 8K Expeditions, Bejul Adventure, Himalayan Guides, Pioneer Adventure, Satori Adventures, 14 Summits, Annapurna Treks, Asian Trekking, Expedition Himalaya, Glacier Himalaya Treks, High Five Adventures (Pioneer Adventures pvt), Imagine Nepal, Makalu Adventure, Peak Promotion, Pralhad Chapagain (Freelancer at Expes.com), Shangri-La Nepal Treks, Snowy Horizon Treks, TAGnepal Trekking, TAGnepal Trekking (Snowy Horizon pvt), and Yeti Adventure. A total of approximately 22 agencies had at least one deadly expedition during this four-year period.
Alt text: The bar graph shows the percent total deaths by each trekking agency that had at least one expedition fatality in 2021, with bars color-coded by season (Autumn in orange, Spring in green). The chart distinguishes between trekking member (M) and hired staff (H) deaths. Pralhad Chapagain (Freelancer at Expes.com) had the highest death rate at 75% (Autumn, trekking member). TAGnepal Trekking (Snowy Horizon pvt) follows with 20% (Autumn, trekking member). Seven Summit Treks had two expeditions: Trek 1 with 54% trekking member deaths and Trek 2 with 10% hired staff deaths (both Spring). Other agencies include 7 Summits Adventure with 3.3% hired staff deaths (Spring) and TAGnepal Trekking with 6.6% trekking member deaths (Spring). A total of 5 agencies in 2021 had at least one deadly expedition.
Alt text: The bar graph shows the total deaths by each trekking agency that had at least one expedition fatality in 2021, with bars color-coded by season (Autumn in orange, Spring in green). The chart distinguishes between trekking member (M) and hired staff (H) deaths. Seven Summit Treks had the highest number of deaths, with 2 total treks resulting in approximately 5 deaths (Spring). Pralhad Chapagain (Freelancer at Expes.com) had around 3 deaths (Autumn). TAGnepal Trekking (Snowy Horizon pvt) had approximately 1 death (Autumn). TAGnepal Trekking had around 1 death (Spring), and 7 Summits Adventure had approximately 1 death (Spring). A total of 5 agencies in 2021 had at least one deadly expedition, with the majority of deaths occurring in the Spring season.
Alt text: The bar graph shows the percent total deaths by each trekking agency that had at least one expedition fatality in 2022, with bars color-coded by season (Autumn in orange, Spring in green). The chart distinguishes between trekking member (M) and hired staff (H) deaths. 7 Summits Adventure had the highest death rate at 20% hired staff deaths (Spring). Shangri-La Nepal Treks follows with 20% trekking member deaths (Autumn). High Five Adventures (Pioneer Adventures pvt) had 14.28% trekking member deaths (Spring). Seven Summit Treks had three expeditions: Trek 1 with 14.2% trekking member deaths (Autumn), Trek 2 with 6.25% hired staff deaths (Autumn), and Trek 3 with 6.67% hired staff deaths (Spring). Satori Adventures had two expeditions: Trek 1 with 6.25% trekking member deaths (Autumn) and Trek 2 with 10% hired staff deaths (Autumn). Pioneer Adventure had 7.6% trekking member deaths (Spring), and Bejul Adventure had 4.7% hired staff deaths (Spring). A total of 7 agencies in 2022 had at least one deadly expedition.
Alt text: The bar graph shows the total deaths by each trekking agency that had at least one expedition fatality in 2022, with bars color-coded by season (Autumn in orange, Spring in green). The chart distinguishes between trekking member (M) and hired staff (H) deaths. Seven Summit Treks had the highest number of deaths, with 3 total treks resulting in approximately 3 deaths (a combination of Spring and Autumn seasons). Satori Adventures had 2 total treks with around 2 deaths (Autumn). Shangri-La Nepal Treks had approximately 1 death (Autumn). Pioneer Adventure, High Five Adventures (Pioneer Adventures pvt), Bejul Adventure, and 7 Summits Adventure each had around 1 death (all Spring season). A total of 7 agencies in 2022 had at least one deadly expedition, with deaths occurring in both Spring and Autumn seasons.
Alt text: The bar graph shows the percent total deaths by each trekking agency that had at least one expedition fatality in 2023, with bars color-coded by season (Autumn in orange, Spring in green). The chart distinguishes between trekking member (M) and hired staff (H) deaths. Notable agencies include Annapurna Treks with a 100% death rate (Spring, trekking member), and Himalayan Guides with three expeditions totaling nearly 39% deaths (Autumn and Spring). Other agencies with high percentages include Peak Promotion (~25%), 14 Summits (~17%), and Asian Trekking (~15%). Seven Summit Treks had three expeditions with death rates between ~4.7% and 6.7%, averaging around 6%. A total of 12 agencies in 2023 had at least one deadly expedition.
Alt text: The bar graph shows the total number of deaths by trekking agency in 2023, with each bar color-coded by season (Autumn in orange, Spring in green). Deaths are categorized as trekking member (M) or hired staff (H). Seven Summit Treks had the highest number, with 5 total deaths across 3 treks (all in Spring). Imagine Nepal and Pioneer Adventure follow with 3 and 2 deaths, respectively. Most other agencies had 1 death each. Himalayan Guides recorded 2 deaths across Autumn and Spring. Only three agencies had deaths in Autumn: Himalayan Guides, 14 Summits, and 8K Expeditions. In total, 12 agencies reported at least one fatality in 2023.
Alt text: The bar graph shows the percent total deaths by trekking agency in 2024, based on expeditions with at least one fatality, all occurring during the Spring season. Deaths are marked as trekking member (M) or hired staff (H). Yeti Adventure had the highest death rate at 50% (M). Other agencies include 8K Expeditions with 11.1% M and 2% H, Snowy Horizon Treks with 10% M, and Makalu Adventure with 6.7% M. Seven Summit Treks had two expeditions with lower death rates: Trek 1 with 4.5% M and 3.3% H, and Trek 2 with 3.2% H. A total of 5 agencies recorded deadly expeditions in Spring 2024.
Alt text: The bar graph shows the total number of deaths by trekking agency in 2024, with all deaths occurring in the Spring season. Deaths are classified as trekking member (M) or hired staff (H). Yeti Adventure recorded 1 death (M). Four other agencies (8K Expeditions, Snowy Horizon Treks, Makalu Adventure, and Seven Summit Treks) each recorded 1 death as well. For 8K Expeditions, the death included both a trekking member (M) and hired staff (H). Seven Summit Treks conducted 2 treks, both resulting in one death (one hired staff each). Overall, 5 agencies had at least one fatality in 2024, all during Spring.
Observations
- Firstly, Seven Summit Treks had the most fatal expeditions from 2021 to 2024: 10 expeditions with at least one fatality, far more than any other agency, each of which had only one or two fatal expeditions.
Unexpectedly, the yearly percentage graphs show that Seven Summit Treks often has a low or medium percent fatality.
The raw death counts, however, were similar across agencies, usually only 1 or 2 deaths per expedition.
This likely means that Seven Summit Treks runs larger expeditions (30+ people) than other agencies. A similar number of deaths per expedition therefore yields a much lower percent fatality than for agencies with small groups (4 to 5 people): one death in a 30-person team is about 3.3%, whereas one death in a 4-person team is 25%.
- However, we cannot conclude whether the higher number of fatal expeditions simply reflects running more expeditions per year, because we examined only expeditions with at least one fatality, not all expeditions.
- Spring had more fatal expeditions than Autumn from 2021 to 2024.
There is not enough data to conclude whether Autumn has a higher percent fatality than Spring among expeditions with at least one fatality. In other words, we cannot tell whether Spring has more fatal expeditions with only one death each while Autumn has fewer fatal expeditions in which more people die.
We also cannot conclude whether this difference reflects a higher or lower total number of expeditions in each season.
- 2022 and 2023 had more fatal expeditions, and more agencies with fatal expeditions, than 2021 and 2024.
- Although there were slightly more member (customer) deaths than hired staff deaths per expedition from 2021 to 2024, the difference is too small to reject the null hypothesis that member and hired staff deaths are equally common.
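The observation above refers to a null hypothesis without naming a test. The sketch below assumes the comparison is framed as an exact two-sided binomial test. Under the null, each death is equally likely to be a member or a hired staff member (p = 0.5), and the p-value can then be computed with the Python standard library alone. The counts are hypothetical placeholders, not the project's data. `binomial_two_sided_p` is an illustrative helper and not necessarily the test the project used.

```python
from math import comb


def binomial_two_sided_p(k: int, n: int, p: float = 0.5) -> float:
    """Exact two-sided binomial test p-value.

    Sums the probabilities of every outcome that is no more likely than
    the observed count k (the usual "minimum likelihood" definition).
    """
    probs = [comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(n + 1)]
    observed = probs[k]
    # A tiny relative tolerance keeps exact ties from being dropped by rounding.
    return min(1.0, sum(q for q in probs if q <= observed * (1 + 1e-9)))


# Hypothetical counts for illustration only (not the project's data).
member_deaths = 12
staff_deaths = 9
n = member_deaths + staff_deaths
p_value = binomial_two_sided_p(member_deaths, n)
print(f"{member_deaths} member vs {staff_deaths} staff deaths: p = {p_value:.3f}")
```

If teams contain unequal numbers of members and staff, p = 0.5 is not a fair null. A more careful version would set `p` to the members' share of total expedition personnel.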
Conclusion, Limitations, and Future Directions
Conclusion
Our analysis of 2020-2024 Himalayan expeditions reveals that the overwhelmingly dominant and historically proven routes are consistently preferred and yield the highest success rates. Top-performing nations strategically prioritize these established paths, underscoring route selection as a critical determinant of expedition success. Conversely, alternative routes demonstrate notably lower effectiveness.
Seven Summit Treks had the most expeditions with at least one death, regardless of whether the deaths were members or staff. Spring had more deadly expeditions than Autumn, and 2022 and 2023 had more deadly expeditions than 2021 and 2024. There appear to be slightly more member (customer) deaths than hired staff deaths, but the data are inconclusive. Overall, although some agencies had more expeditions with at least one fatality, the data do not conclusively link Himalayan expedition fatalities to one or a few agencies that are markedly worse than the rest.
Key Limitations
Our analysis faced two primary limitations:
Data Quality: The initial dataset required significant pre-processing due to inconsistencies, particularly with consolidated route information, impacting granular analysis.
Data Sparsity: Our filtered dataset had many variables relative to its number of entries, which limited our ability to draw general conclusions or identify complex statistical relationships.
For question 2, the analysis would benefit from a broader question that compares non-fatal and fatal expeditions. We also included raw death-count graphs because some groups showed a high percentage of deaths that was an artifact of small group sizes: a single death in a small team produces a large percentage. Lastly, we could not combine the graphs with patchwork, so each graph takes up more space.
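The small-group effect described above can be made concrete. The following Python sketch uses hypothetical team sizes of 4 and 30 and computes the percent fatality for a single death. It also adds a Wilson score interval, a swapped-in technique that was not part of the original analysis, to show how much less certain a percentage from a 4-person team is than one from a 30-person team. `percent_fatality` and `wilson_interval` are illustrative helpers.

```python
from math import sqrt


def percent_fatality(deaths: int, team_size: int) -> float:
    """Share of a team that died, as a percentage."""
    return 100.0 * deaths / team_size


def wilson_interval(deaths: int, team_size: int, z: float = 1.96) -> tuple[float, float]:
    """Approximate 95% Wilson score interval for the death proportion, in percent."""
    p = deaths / team_size
    denom = 1 + z**2 / team_size
    center = (p + z**2 / (2 * team_size)) / denom
    half = z * sqrt(p * (1 - p) / team_size + z**2 / (4 * team_size**2)) / denom
    return 100 * max(0.0, center - half), 100 * min(1.0, center + half)


# Hypothetical teams: the same single death, very different percentages.
for team_size in (4, 30):
    pct = percent_fatality(1, team_size)
    low, high = wilson_interval(1, team_size)
    print(f"1 death in {team_size:>2}: {pct:5.1f}%  (95% CI {low:.1f}% to {high:.1f}%)")
```

The much wider interval for the small team is why reporting raw counts, or an interval, alongside percentages helps. Otherwise, one death in a tiny team can make an agency look unusually deadly.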
Future Directions
Building on these insights, future research should focus on:
In-depth Dataset Exploration: Leveraging the comprehensive original Himalayan Database (https://www.himalayandatabase.com/hbn2019.html) to conduct more robust analyses and explore broader historical trends.
Expanded Variable Analysis: Investigating a wider range of factors, such as leadership roles, team sizes, and seasonal influences, and comparing fatal with non-fatal expeditions while controlling for other potentially correlated variables.
Predictive Modeling: Developing models to forecast expedition success based on various input factors, aiding future planning and risk management.
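The future directions above suggest dividing fatal expeditions by all expeditions, which would separate "more fatal expeditions" from "more expeditions overall". The sketch below assumes each expedition is reduced to a hypothetical `Expedition` record with `agency`, `season`, and `deaths` fields; this is not the real database schema. It runs on made-up records and computes the share of expeditions with at least one death per group.

```python
from collections import defaultdict
from typing import NamedTuple


class Expedition(NamedTuple):
    """Hypothetical, simplified expedition record (not the database schema)."""
    agency: str
    season: str
    deaths: int


def fatal_expedition_rate(records: list[Expedition], field: str) -> dict[str, float]:
    """Share of expeditions with at least one death, grouped by `field`."""
    totals: dict[str, int] = defaultdict(int)
    fatal: dict[str, int] = defaultdict(int)
    for record in records:
        key = getattr(record, field)  # e.g. "agency" or "season"
        totals[key] += 1
        if record.deaths > 0:
            fatal[key] += 1
    # fatal[key] defaults to 0 for groups with no deadly expeditions.
    return {key: fatal[key] / totals[key] for key in totals}


# Made-up records for illustration only.
expeditions = [
    Expedition("Agency A", "Spring", 1),
    Expedition("Agency A", "Spring", 0),
    Expedition("Agency A", "Autumn", 0),
    Expedition("Agency A", "Spring", 1),
    Expedition("Agency A", "Spring", 0),
    Expedition("Agency A", "Autumn", 0),
    Expedition("Agency B", "Spring", 1),
    Expedition("Agency B", "Autumn", 0),
]

print(fatal_expedition_rate(expeditions, "agency"))
print(fatal_expedition_rate(expeditions, "season"))
```

In this toy data, Agency A has more fatal expeditions than Agency B (2 versus 1) but a lower fatal-expedition rate (about 33% versus 50%). This is the same ambiguity noted above for Seven Summit Treks.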